 dirichlet distribution


Courtroom Analogy: New Perspective on Uncertainty-Aware Classification

arXiv.org Machine Learning

Single-pass uncertainty quantification (UQ) methods for classification represent uncertainty by predicting a tractable distribution over the class probability vector. While existing approaches primarily focus on enhancing the expressiveness of this distribution, they often provide limited insight into how predictive uncertainty is structured and aggregated, resulting in weak interpretability. We introduce the courtroom analogy, which conceptualizes uncertainty-aware classification as a structured debate among class-specific advocates. Each advocate forms a probabilistic opinion, and a final verdict is reached by aggregating these opinions using input-dependent plausibility weights. In this framework, each advocate's opinion is modeled as a Dirichlet distribution whose concentration parameter is decomposed into shared evidence and class-specific advocacy. This yields a structured mixture of Dirichlet distributions with semantically interpretable parameters. To instantiate this formulation, we propose Mixture of Dirichlet EXperts (MoDEX), a single-pass neural architecture that predicts the courtroom parameters, enabling efficient and expressive UQ while explicitly modeling uncertainty aggregation. We demonstrate that MoDEX enjoys strong theoretical properties and achieves state-of-the-art UQ performance across diverse benchmarks, yielding interpretable uncertainty estimates with meaningful semantics.
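As a rough illustration of how such a mixture could be aggregated, the sketch below computes the predictive mean of a mixture of Dirichlet distributions. The abstract does not give the exact decomposition, so the parameterization here is an assumption: advocate k takes a shared evidence vector and adds its own advocacy to class k only. The names `advocate_concentration`, `mixture_mean`, `shared`, `advocacy`, and `weights` are hypothetical and not taken from the paper.

```python
def advocate_concentration(shared, advocacy, k):
    """Concentration vector of advocate k (assumed form: shared evidence
    plus an extra boost on the advocate's own class)."""
    alpha = list(shared)
    alpha[k] += advocacy[k]
    return alpha


def mixture_mean(shared, advocacy, weights):
    """Predictive mean of the mixture: sum_k weights[k] * E[Dirichlet(alpha_k)]."""
    num_classes = len(shared)
    mean = [0.0] * num_classes
    for k in range(num_classes):
        alpha = advocate_concentration(shared, advocacy, k)
        total = sum(alpha)
        for c in range(num_classes):
            # The mean of a Dirichlet is its normalized concentration vector.
            mean[c] += weights[k] * alpha[c] / total
    return mean


# Hypothetical inputs: three classes, plausibility weights summing to one.
shared = [1.0, 1.0, 1.0]
advocacy = [3.0, 0.5, 0.5]
weights = [0.8, 0.1, 0.1]
verdict = mixture_mean(shared, advocacy, weights)
```

Because each component mean sums to one and the weights sum to one, `verdict` is itself a valid probability vector.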


Dirichlet-Based Monte Carlo Dropout for Uncertainty Estimation in Neural Networks

arXiv.org Machine Learning

Traditional neural networks provide deterministic predictions without inherent uncertainty estimates. While Bayesian Neural Networks (BNNs) offer a principled approach to uncertainty quantification, their computational complexity limits scalability. Monte Carlo (MC) Dropout, initially introduced as a regularization technique, has been shown to approximate Bayesian inference by enabling probabilistic modeling through multiple stochastic forward passes. In this work, we enhance uncertainty estimation in deep learning by integrating a Dirichlet-based framework within MC Dropout. Specifically, we leverage the formulation proposed by Sensoy et al. (2018), where class probabilities are modeled using a Dirichlet distribution, allowing for a more informative uncertainty representation. The proposed approach maintains the computational efficiency of MC Dropout while improving the quality of uncertainty estimates. We discuss the theoretical foundations of our method and compare it with existing uncertainty quantification techniques. The results highlight the effectiveness of the proposed method in producing well-calibrated uncertainty estimates, offering a practical solution for uncertainty-aware deep learning models.
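A minimal sketch of the idea, under stated assumptions: in the formulation of Sensoy et al. (2018), non-negative per-class evidence e_k gives Dirichlet parameters alpha_k = e_k + 1, expected probabilities alpha_k / S, and an uncertainty mass K / S, where S is the sum of the alpha_k and K is the number of classes. How the paper combines these quantities across dropout passes is not stated in the abstract; averaging them over passes, as done here, is an assumption, and the function names are hypothetical.

```python
def dirichlet_summary(evidence):
    """Expected class probabilities and uncertainty mass for one forward pass."""
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]      # alpha_k = e_k + 1
    strength = sum(alpha)                    # S = sum_k alpha_k
    probs = [a / strength for a in alpha]    # E[p_k] = alpha_k / S
    uncertainty = num_classes / strength     # u = K / S
    return probs, uncertainty


def mc_dropout_average(evidence_per_pass):
    """Average the per-pass summaries over T stochastic forward passes
    (assumed aggregation rule, not confirmed by the abstract)."""
    summaries = [dirichlet_summary(e) for e in evidence_per_pass]
    num_passes = len(summaries)
    num_classes = len(summaries[0][0])
    mean_probs = [
        sum(probs[c] for probs, _ in summaries) / num_passes
        for c in range(num_classes)
    ]
    mean_uncertainty = sum(u for _, u in summaries) / num_passes
    return mean_probs, mean_uncertainty


# Hypothetical evidence vectors from three dropout passes on one input.
passes = [[9.0, 1.0, 0.0], [7.0, 2.0, 1.0], [8.0, 0.0, 2.0]]
mean_probs, mean_uncertainty = mc_dropout_average(passes)
```

With zero evidence for every class, the uncertainty mass is 1, which is the fully uncertain case in this formulation.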


Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

Neural Information Processing Systems

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon H with S states and A actions. The performance of an agent is measured by the regret after interacting with the environment for T episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of posterior sampling that only needs a number of posterior samples logarithmic in H, S, A, and T per state-action pair.
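The general idea of drawing several posterior samples per state-action pair and keeping the most favorable one can be sketched as follows. This is an illustration under assumptions, not the OPSRL algorithm itself: it assumes a Dirichlet posterior over next-state probabilities built from visit counts with a uniform prior, and it takes the maximum of the sampled expected next-state values. The function names and the `prior` parameter are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def optimistic_value(counts, next_values, num_samples, rng, prior=1.0):
    """Largest expected next-state value over several posterior samples
    of the transition probabilities for one state-action pair."""
    alpha = [prior + c for c in counts]  # Dirichlet posterior parameters
    best = float("-inf")
    for _ in range(num_samples):
        probs = sample_dirichlet(alpha, rng)
        value = sum(p * v for p, v in zip(probs, next_values))
        best = max(best, value)
    return best


# Hypothetical state-action pair with three possible next states.
counts = [5, 1, 0]
next_values = [0.0, 0.5, 1.0]
value = optimistic_value(counts, next_values, num_samples=8, rng=random.Random(0))
```

Taking the maximum over more samples can only raise the estimate, which is what makes the sampled value optimistic.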



08f90c1a417155361a5c4b8d297e0d78-Supplemental.pdf

Neural Information Processing Systems

Now consider a perturbation of the prior distribution over transition functions $\delta : T \to \mathbb{R}_{\geq 0}$ such that $\int_{T_p} \delta(T_p) P(T_p \mid h_0) \, dT_p = 1$. Proof: Proposition 2 directly extends Proposition 1 in [8] to BAMDPs. Therefore, the perturbed distribution over histories is also a valid probability distribution. Provided that $c_{bo}$ is chosen appropriately (details in the appendix), as the number of perturbations expanded approaches $\infty$, a perturbation within any $\epsilon > 0$ of the optimal perturbation will be expanded by the Bayesian optimisation procedure with probability $1 - \delta$. Proof: Consider an adversary decision node, v, associated with augmented state (s,ha,y) in the BACVaR-SG. We begin by proving that $Q((s,ha,y),\xi)$ is continuous with respect to $\xi$. Define a function $d : S \to \mathbb{R}$, such that $\xi + d$ produces a valid adversary perturbation.


Details

Neural Information Processing Systems

To keep experiments uniform, for all datasets (STL-10, CIFAR-10, and CIFAR-100) we used a train/val/test partitioning. In our experiments we compared FED with four baselines. For all baselines we tried different learning rates [0.1, 0.01, 0.001] and batch sizes [32, 64, 100]. For EnDD and EnDD + AUX, we used the same temperature, temperature annealing, and optimizer that was used in the original paper. For AMT, we tried different alphas [1e1, 1e3, 1e5] and kept the rest as the original paper.


Functional Ensemble Distillation

Neural Information Processing Systems

Bayesian models have many desirable properties, most notably their ability to generalize from limited data and to properly estimate the uncertainty in their predictions. However, these benefits come at a steep computational cost, as Bayesian inference is, in most cases, computationally intractable. One popular approach to alleviate this problem is Monte Carlo estimation with an ensemble of models sampled from the posterior. However, this approach still comes at a significant computational cost, as one needs to store and run multiple models at test time. In this work, we investigate how to best distill an ensemble's predictions using an efficient model.



Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

Neural Information Processing Systems

We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. The resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight generalization bounds, in a series of numerical experiments when compared to competing algorithms which also minimize PAC-Bayes objectives - both with uninformed (data-independent) and informed (data-dependent) priors.
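To see why Dirichlet weights make the expected error tractable, consider a single example in binary classification. If the voting weights follow Dirichlet(alpha), the total weight on the voters that are wrong follows Beta(a_wrong, a_right), where a_wrong and a_right are the sums of alpha over the wrong and the correct voters (the aggregation property of the Dirichlet distribution). The vote errs when that weight exceeds one half, so the expected error is a Beta tail probability. The sketch below evaluates that tail by simple numerical integration rather than a special-function library; it is an illustration of this property, not the paper's expression, and the function names are hypothetical. The integration is only intended for a_wrong and a_right of at least 1, where the density is bounded.

```python
import math


def beta_tail_above_half(a, b, steps=20000):
    """P(W > 0.5) for W ~ Beta(a, b), using the midpoint rule on [0.5, 1]."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = 0.5 / steps
    total = 0.0
    for i in range(steps):
        x = 0.5 + (i + 0.5) * width  # midpoints stay strictly inside (0.5, 1)
        log_pdf = log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)
        total += math.exp(log_pdf)
    return total * width


def expected_vote_error(alpha, voter_is_wrong):
    """Expected 0-1 error of the stochastic majority vote on one example."""
    a_wrong = sum(a for a, wrong in zip(alpha, voter_is_wrong) if wrong)
    a_right = sum(a for a, wrong in zip(alpha, voter_is_wrong) if not wrong)
    if a_wrong == 0:
        return 0.0  # every voter is correct
    if a_right == 0:
        return 1.0  # every voter is wrong
    return beta_tail_above_half(a_wrong, a_right)


# Three voters with uniform Dirichlet parameters; only the first one errs.
error = expected_vote_error([1.0, 1.0, 1.0], [True, False, False])
```

In this example the wrong-voter weight follows Beta(1, 2), whose tail above one half is 0.25.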


Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification

arXiv.org Machine Learning

Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.
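A method of moments fit of this kind can be sketched as follows. For a Dirichlet with mean m and total concentration alpha_0, each component has variance m_k (1 - m_k) / (alpha_0 + 1), so every class with non-zero variance across ensemble members yields an estimate of alpha_0. Averaging those per-class estimates, as done here, is one common variant and an assumption; the abstract does not say which estimator the paper uses, and the optional maximum-likelihood refinement is not shown. The function name is hypothetical.

```python
import statistics


def dirichlet_method_of_moments(member_probs):
    """Fit Dirichlet parameters to softmax vectors from an ensemble.

    member_probs: list of probability vectors, one per ensemble member.
    """
    num_classes = len(member_probs[0])
    means = [
        statistics.fmean(p[c] for p in member_probs) for c in range(num_classes)
    ]
    estimates = []
    for c in range(num_classes):
        var = statistics.variance([p[c] for p in member_probs])
        if var > 0.0:
            # Var[p_c] = m_c (1 - m_c) / (alpha_0 + 1), solved for alpha_0.
            estimates.append(means[c] * (1.0 - means[c]) / var - 1.0)
    if not estimates:
        raise ValueError("ensemble members agree exactly; alpha_0 is unbounded")
    alpha_0 = statistics.fmean(estimates)
    if alpha_0 <= 0.0:
        raise ValueError("ensemble spread is too large for a Dirichlet fit")
    return [m * alpha_0 for m in means]


# Hypothetical softmax outputs from four ensemble members on one input.
members = [
    [0.70, 0.20, 0.10],
    [0.60, 0.25, 0.15],
    [0.80, 0.15, 0.05],
    [0.65, 0.20, 0.15],
]
alpha = dirichlet_method_of_moments(members)
```

The fitted parameters keep the ensemble mean as the Dirichlet mean, while their sum reflects how tightly the members agree.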